Extended Language Models for XML Element Retrieval
Internal identifier: 000358 (Main/Exploration); previous: 000357; next: 000359
Authors: Rongmei Li [Netherlands]; Theo Van Der Weide [Netherlands]
Source:
- Lecture Notes in Computer Science [0302-9743]; 2011.
English descriptors
- Teeft :
- Baseline, Castitle, Context task, Dirichlet priors, Document retrieval, Element retrieval, Full document retrieval, Indri search engine, Inex, Language model, Language models, Magp, Main content, Measure score, Prec, Query, Query model, Query term generation, Ranking function, Relevant characters, Retrieval, Retrieval model, Retrieval tasks, Roman architecture, Simplest language model, Snippet, Snippet retrieval, Weide table, Wikipedia.
Abstract
In this paper we describe our participation in the INEX 2010 ad-hoc track. We participated in three retrieval tasks (restricted focused task, relevant-in-context, restricted relevant-in-context) and report our findings based on a single set of measures for all tasks. In this year’s participation, we evaluate the performance of the standard language model that is more focused on a fixed number of relevant characters than on relevant paragraphs. Our findings are: 1) the simplest language model for document retrieval performs relatively well in the restricted focused task when using a fixed offset that is close to the average character distance from the beginning of a document to its main content; 2) a good result of document ranking does improve the performance of snippet retrieval; 3) stemming and stopword removal can further boost performance.
DOI: 10.1007/978-3-642-23577-1_8
Links toward previous steps (curation, corpus...)
- to stream Istex, to step Corpus: 001030
- to stream Istex, to step Curation: 000F51
- to stream Istex, to step Checkpoint: 000219
- to stream Main, to step Merge: 000358
- to stream Main, to step Curation: 000358
The document in XML format
<record><TEI wicri:istexFullTextTei="biblStruct"><teiHeader><fileDesc><titleStmt><title xml:lang="en">Extended Language Models for XML Element Retrieval</title>
<author><name sortKey="Li, Rongmei" sort="Li, Rongmei" uniqKey="Li R" first="Rongmei" last="Li">Rongmei Li</name>
</author>
<author><name sortKey="Van Der Weide, Theo" sort="Van Der Weide, Theo" uniqKey="Van Der Weide T" first="Theo" last="Van Der Weide">Theo Van Der Weide</name>
</author>
</titleStmt>
<publicationStmt><idno type="wicri:source">ISTEX</idno>
<idno type="RBID">ISTEX:9DB55E3C22A27D13B71FD26B6A7E8CF090DC30D4</idno>
<date when="2011" year="2011">2011</date>
<idno type="doi">10.1007/978-3-642-23577-1_8</idno>
<idno type="url">https://api.istex.fr/document/9DB55E3C22A27D13B71FD26B6A7E8CF090DC30D4/fulltext/pdf</idno>
<idno type="wicri:Area/Istex/Corpus">001030</idno>
<idno type="wicri:explorRef" wicri:stream="Istex" wicri:step="Corpus" wicri:corpus="ISTEX">001030</idno>
<idno type="wicri:Area/Istex/Curation">000F51</idno>
<idno type="wicri:Area/Istex/Checkpoint">000219</idno>
<idno type="wicri:explorRef" wicri:stream="Istex" wicri:step="Checkpoint">000219</idno>
<idno type="wicri:doubleKey">0302-9743:2011:Li R:extended:language:models</idno>
<idno type="wicri:Area/Main/Merge">000358</idno>
<idno type="wicri:Area/Main/Curation">000358</idno>
<idno type="wicri:Area/Main/Exploration">000358</idno>
</publicationStmt>
<sourceDesc><biblStruct><analytic><title level="a" type="main" xml:lang="en">Extended Language Models for XML Element Retrieval</title>
<author><name sortKey="Li, Rongmei" sort="Li, Rongmei" uniqKey="Li R" first="Rongmei" last="Li">Rongmei Li</name>
<affiliation wicri:level="3"><country xml:lang="fr">Pays-Bas</country>
<wicri:regionArea>Radboud University, Nijmegen</wicri:regionArea>
<placeName><settlement type="city">Nimègue</settlement>
<region type="province" nuts="2">Gueldre</region>
</placeName>
</affiliation>
</author>
<author><name sortKey="Van Der Weide, Theo" sort="Van Der Weide, Theo" uniqKey="Van Der Weide T" first="Theo" last="Van Der Weide">Theo Van Der Weide</name>
<affiliation wicri:level="3"><country xml:lang="fr">Pays-Bas</country>
<wicri:regionArea>Radboud University, Nijmegen</wicri:regionArea>
<placeName><settlement type="city">Nimègue</settlement>
<region type="province" nuts="2">Gueldre</region>
</placeName>
</affiliation>
</author>
</analytic>
<monogr></monogr>
<series><title level="s">Lecture Notes in Computer Science</title>
<imprint><date>2011</date>
</imprint>
<idno type="ISSN">0302-9743</idno>
<idno type="eISSN">1611-3349</idno>
<idno type="ISSN">0302-9743</idno>
</series>
</biblStruct>
</sourceDesc>
<seriesStmt><idno type="ISSN">0302-9743</idno>
</seriesStmt>
</fileDesc>
<profileDesc><textClass><keywords scheme="Teeft" xml:lang="en"><term>Baseline</term>
<term>Castitle</term>
<term>Context task</term>
<term>Dirichlet priors</term>
<term>Document retrieval</term>
<term>Element retrieval</term>
<term>Full document retrieval</term>
<term>Indri search engine</term>
<term>Inex</term>
<term>Language model</term>
<term>Language models</term>
<term>Magp</term>
<term>Main content</term>
<term>Measure score</term>
<term>Prec</term>
<term>Query</term>
<term>Query model</term>
<term>Query term generation</term>
<term>Ranking function</term>
<term>Relevant characters</term>
<term>Retrieval</term>
<term>Retrieval model</term>
<term>Retrieval tasks</term>
<term>Roman architecture</term>
<term>Simplest language model</term>
<term>Snippet</term>
<term>Snippet retrieval</term>
<term>Weide table</term>
<term>Wikipedia</term>
</keywords>
</textClass>
<langUsage><language ident="en">en</language>
</langUsage>
</profileDesc>
</teiHeader>
<front><div type="abstract" xml:lang="en">Abstract: In this paper we describe our participation in the INEX 2010 ad-hoc track. We participated in three retrieval tasks (restricted focused task, relevant-in-context, restricted relevant-in-context) and report our findings based on a single set of measure for all tasks. In this year’s participation, we evaluate the performance of the standard language model that is more focused on a fixed number of relevant characters than on relevant paragraphs. Our findings are: 1) the simplest language model for document retrieval performs relatively well in the restricted focused task when using a fixed offset that is close to the average character distance from the beginning of a document to its main content; 2) a good result of document ranking does improve the performance of snippet retrieval; 3) stemming and stopword removal can further boost performance.</div>
</front>
</TEI>
<affiliations><list><country><li>Pays-Bas</li>
</country>
<region><li>Gueldre</li>
</region>
<settlement><li>Nimègue</li>
</settlement>
</list>
<tree><country name="Pays-Bas"><region name="Gueldre"><name sortKey="Li, Rongmei" sort="Li, Rongmei" uniqKey="Li R" first="Rongmei" last="Li">Rongmei Li</name>
</region>
<name sortKey="Van Der Weide, Theo" sort="Van Der Weide, Theo" uniqKey="Van Der Weide T" first="Theo" last="Van Der Weide">Theo Van Der Weide</name>
</country>
</tree>
</affiliations>
</record>
To manipulate this document under Unix (Dilib)
EXPLOR_STEP=$WICRI_ROOT/Wicri/Sarre/explor/MusicSarreV3/Data/Main/Exploration
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000358 | SxmlIndent | more
Or
HfdSelect -h $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd -nk 000358 | SxmlIndent | more
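Once a record has been extracted with one of the commands above and saved to a file or string, it can be read programmatically. The following is a minimal sketch in Python, not part of Dilib. It assumes the record text is the bare `<record>…</record>` element shown on this page, with no XML declaration. The record uses the `wicri:` prefix without declaring it, so the sketch wraps the text in a hypothetical `<wrapper>` element that binds the prefix to a placeholder URI (`urn:example:wicri`, an assumption made here only so that a standard parser accepts the input). The function names `parse_record` and `summarize` are illustrative.

```python
import xml.etree.ElementTree as ET

# Placeholder namespace URI (an assumption): the record does not declare
# the "wicri" prefix, so one is bound here only to make parsing possible.
WICRI_NS = "urn:example:wicri"


def parse_record(record_xml: str) -> ET.Element:
    """Parse a bare <record> element that uses an undeclared wicri: prefix."""
    wrapped = f'<wrapper xmlns:wicri="{WICRI_NS}">{record_xml}</wrapper>'
    root = ET.fromstring(wrapped)
    record = root.find("record")
    if record is None:
        raise ValueError("no <record> element found")
    return record


def summarize(record: ET.Element) -> dict:
    """Collect the title, author names, and DOI from the TEI header."""
    title = record.findtext(".//titleStmt/title")
    authors = [name.text for name in record.findall(".//titleStmt/author/name")]
    doi = record.findtext(".//publicationStmt/idno[@type='doi']")
    return {"title": title, "authors": authors, "doi": doi}


if __name__ == "__main__":
    # Abbreviated copy of the record above, kept short for illustration.
    sample = (
        '<record><TEI wicri:istexFullTextTei="biblStruct"><teiHeader><fileDesc>'
        '<titleStmt><title xml:lang="en">Extended Language Models for XML '
        'Element Retrieval</title>'
        '<author><name>Rongmei Li</name></author>'
        '<author><name>Theo Van Der Weide</name></author></titleStmt>'
        '<publicationStmt><idno type="doi">10.1007/978-3-642-23577-1_8</idno>'
        '</publicationStmt></fileDesc></teiHeader></TEI></record>'
    )
    print(summarize(parse_record(sample)))
```

Wrapping the text avoids editing the record itself; if the output of `SxmlIndent` differs from the form shown on this page (for example, if it adds an XML declaration), the wrapping step would need to be adapted.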
To link to this page within the Wicri network
{{Explor lien |wiki= Wicri/Sarre |area= MusicSarreV3 |flux= Main |étape= Exploration |type= RBID |clé= ISTEX:9DB55E3C22A27D13B71FD26B6A7E8CF090DC30D4 |texte= Extended Language Models for XML Element Retrieval }}
This area was generated with Dilib version V0.6.33.